Converting between data frames and contingency tables
Problem
You want to convert between a data frame of cases, a data frame of counts of each type of case, and a contingency table.
Solution
Suppose we start with this data frame, in which each row represents one case:
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))
#  Sex Color
#    M brown
#    M  blue
#    F brown
#    F brown
#    F brown
It can also be represented as a contingency table. Here it is converted and stored in ctable:
# Cases to Table
ctable <- table(cases)
#      Color
# Sex   blue brown
#   F      0     3
#   M      1     1

# If you call table using two vectors, it will not add names (Sex and Color) to the dimensions
table(cases$Sex, cases$Color)
#     blue brown
#   F    0     3
#   M    1     1

# The dimension names can be specified manually, or by using a subset of the data frame that
# contains only the desired columns
table(cases$Sex, cases$Color, dnn=c("Sex","Color"))
table(cases[,c("Sex","Color")])
#      Color
# Sex   blue brown
#   F      0     3
#   M      1     1
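The difference between the two calls can be checked directly by inspecting the dimension names. This is a minimal sketch that rebuilds the example data so it runs on its own; the variable names named and unnamed are chosen here for illustration only.

```r
# Rebuild the example data so this snippet is self-contained
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))

# Table built from the data frame: dimensions take the column names
named <- table(cases)

# Table built from two vectors: dimensions are left without names
unnamed <- table(cases$Sex, cases$Color)

# Compare the names attached to the dimensions
names(dimnames(named))
names(dimnames(unnamed))

# The counts themselves are the same in both tables
all(named == unnamed)
```

Only the labels differ between the two tables, so the choice mainly matters for printing and for later steps that refer to the dimensions by name.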
It can also be represented as a data frame of counts of each combination. Here it is converted and stored in countdf:
# Cases to Counts
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)
#  Sex Color Freq
#    F  blue    0
#    M  blue    1
#    F brown    3
#    M brown    1
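Note that the data frame of counts keeps combinations that never occur (here, F and blue, with a count of 0). A small sketch of how one might confirm that no cases were lost and, if desired, drop the empty combinations; the name nonzero is hypothetical and used only for this illustration.

```r
# Rebuild the example data so this snippet is self-contained
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)

# The counts should add up to the number of original cases
sum(countdf$Freq) == nrow(cases)

# Optionally keep only the combinations that actually occur
nonzero <- countdf[countdf$Freq > 0, ]
nonzero
```

Dropping the zero rows is a presentation choice. Keeping them preserves the full set of factor combinations, which matters if the counts are later turned back into a table.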
These three data structures represent the same information in different ways. Here are other ways of converting between them. Some of these require the function expand.dft(), which is defined below.
Converting from a contingency table to the other two formats:
# Table to Counts
as.data.frame(ctable, stringsAsFactors=TRUE)

# Table to Cases
expand.dft(as.data.frame(ctable, stringsAsFactors=TRUE))
Converting from a data frame of counts to the other two formats:
# Counts to Cases
expand.dft(countdf)

# Counts to Table
# The formula must use the column names of countdf (Sex and Color)
xtabs(Freq ~ Sex+Color, data=countdf)
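As a sanity check, the table rebuilt from the counts can be compared with the one built directly from the cases. This is a sketch under the assumption that the count columns are named Sex, Color, and Freq, as in the example above; the names direct and rebuilt are illustrative.

```r
# Rebuild the example data so this snippet is self-contained
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)

# Table built directly from the cases
direct <- table(cases)

# Table rebuilt from the data frame of counts
rebuilt <- xtabs(Freq ~ Sex + Color, data=countdf)

# Both routes should give the same counts and the same dimension names
all(direct == rebuilt)
identical(names(dimnames(rebuilt)), c("Sex", "Color"))
```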
expand.dft() function
expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
    # Take each row in the source data frame table and replicate it
    # using the Freq value
    DF <- sapply(1:nrow(x),
                 function(i) x[rep(i, each = x$Freq[i]), ],
                 simplify = FALSE)

    # Take the above list and rbind it to create a single DF
    # Also subset the result to eliminate the Freq column
    DF <- subset(do.call("rbind", DF), select = -Freq)

    # Now apply type.convert to the character coerced factor columns
    # to facilitate data type selection for each column
    for (i in 1:ncol(DF)) {
        DF[[i]] <- type.convert(as.character(DF[[i]]),
                                na.strings = na.strings,
                                as.is = as.is, dec = dec)
    }

    DF
}
This function was written by Marc Schwartz.
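For simple cases, the row replication at the heart of expand.dft() can also be sketched with plain row indexing. This is an alternative illustration, not the function above: it assumes the count column is named Freq, it does not run type.convert(), and so the columns stay as factors. The name expanded is hypothetical.

```r
# Rebuild the example data so this snippet is self-contained
cases <- data.frame(Sex=c("M", "M", "F", "F", "F"),
                    Color=c("brown", "blue", "brown", "brown", "brown"))
countdf <- as.data.frame(table(cases), stringsAsFactors=TRUE)

# Repeat each row index Freq times; rows with Freq == 0 are repeated
# zero times and therefore drop out. Then discard the Freq column.
idx <- rep(seq_len(nrow(countdf)), times=countdf$Freq)
expanded <- countdf[idx, names(countdf) != "Freq"]

# Reset the row names, which otherwise carry the replicated indices
rownames(expanded) <- NULL

# The expanded cases should tabulate back to the original counts
all(table(expanded) == table(cases))
```

The rows come back grouped by combination rather than in the original case order, which is also true of any conversion that starts from counts, since the counts do not record ordering.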